A Two-Stage Approach for Generating Topic Models
Abstract
Topic modeling is widely used in information retrieval, text mining, text classification, and related fields. Most existing statistical topic modeling methods, such as LDA and pLSA, generate a term-based representation of a topic by selecting single words from the multinomial word distribution over that topic. This approach has two main shortcomings: first, popular or common words occur frequently across different topics, which makes the topics ambiguous; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform those of the typical statistical topic modeling method LDA in terms of accuracy and certainty.
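The first shortcoming can be illustrated with a minimal sketch. It is not the paper's method: the two topic-word distributions are hypothetical toy values, and the helper names `top_words` and `shared_words` are introduced here only for illustration. The sketch builds a term-based representation by taking the highest-probability words of each topic, then lists the words that end up representing more than one topic.

```python
from collections import Counter

# Hypothetical toy topic-word distributions (illustrative values, not from the paper).
topics = {
    "topic_0": {"data": 0.30, "model": 0.25, "query": 0.20, "index": 0.15, "rank": 0.10},
    "topic_1": {"data": 0.28, "model": 0.22, "gene": 0.20, "protein": 0.18, "cell": 0.12},
}


def top_words(word_probs, n):
    """Return the n highest-probability words of one topic (ties broken alphabetically)."""
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def shared_words(representations):
    """Return the words that appear in more than one topic's representation."""
    counts = Counter(w for words in representations.values() for w in set(words))
    return sorted(w for w, c in counts.items() if c > 1)


# Term-based representation: the top 3 single words per topic.
representations = {name: top_words(dist, 3) for name, dist in topics.items()}

# Words that represent several topics at once and therefore do not discriminate between them.
ambiguous = shared_words(representations)

print(representations)
print(ambiguous)
```

In this toy setting, the common words "data" and "model" rank highly in both topics, so only one of the three words in each representation actually distinguishes the topic.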
Similar resources
A New Approach Generating Robust and Stable Schedules in m-Machine Flow Shop Scheduling Problems: A Case Study
This paper considers a scheduling problem with uncertain processing times and machine breakdowns in industrial/office workplaces and solves it via a novel robust optimization method. In the traditional robust optimization, the solution robustness is maintained only for a specific set of scenarios, which may worsen the situation for new scenarios. Thus, a two-stage predictive algorithm is prop...
Stage specialization for design and analysis of flotation circuits
This paper presents a new approach for flotation circuit design. Initially, it was proven numerically and analytically that in order to achieve the highest recovery in different circuit configurations, the best equipment must be placed at the beginning stage of the flotation circuits. The size of the entering particles and the types of streams including pulp and froth were considered as the bas...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
Presenting a New Model for Bank’s Supply Chain Performance Evaluating with DEA Solution Approach
Data Envelopment Analysis (DEA) is a method for measuring the efficiency of peer decision making units (DMUs) with multiple inputs and outputs. The traditional DEA treats decision making units under evaluation as black boxes and calculates their efficiencies with first inputs and last outputs. This carries the notion of missing some intermediate measures in the process of changing the inputs to...
ANN-DEA Integrated Approach for Sensitivity Analysis in Efficiency Models
Here, we examine the capability of artificial neural networks (ANNs) in sensitivity analysis of the parameters of efficiency analysis model, namely data envelopment analysis (DEA). We are mainly interested to observe the required change of a group of parameters when another group goes under a managerial change, maintaining the score of the efficiency. In other words, this methodology provides a...
Publication date: 2013